Eliciting Natural Speech From Non-Native Users: Collecting Speech Data For LVCSR
Authors
Abstract
In this paper, we discuss the design of a database of recorded and transcribed read and spontaneous speech from semi-fluent, strongly accented non-native speakers of English. While many speech applications work best with a recognizer that expects native-like usage, others could benefit from a speech recognition component that is forgiving of the sorts of errors that are not a barrier to communication; to train such a recognizer, a database of non-native speech is needed. We examine how collecting data from non-native speakers must necessarily differ from collecting it from native speakers, and describe the work we did to develop an appropriate scenario, recording setup, and optimal surroundings during recording.
Similar articles
Handling Non-native Speech in LVCSR: A Preliminary Study
In moving towards full incorporation of CSR in applications whose users include non-native speakers, an understanding of how the system can be modified to increase its tolerance to non-native idiosyncrasies such as accented pronunciation and disfluent form is essential. While experiments geared towards restricted-use systems have suggested that extremely simple techniques are effective, prelimin...
Adaptation Methods for Non-native Speech
LVCSR performance is consistently poor on low-proficiency non-native speech. While gains from speaker adaptation can often bring recognizer performance on high-proficiency non-native speakers close to that seen for native speakers [12], recognition for lower-proficiency speakers remains low even after individual speaker adaptation [2]. The challenge for accent adaptation is to maximize recognizer p...
Hypothesis-driven accent discrimination
Native and non-native use of language differs, depending on the proficiency of the speaker, in clear and quantifiable ways. It has been shown that customizing the acoustic and language models of a natural language understanding system can significantly improve handling of non-native input; in order to make such a switch, however, the nativeness status of the user must be known. In this paper, w...
Exploring Pragmalinguistic and Sociopragmatic Variability in Speech Act Production of L2 Learners and Native Speakers
The pragmalinguistic and sociopragmatic aspects of language use vary across different situations, languages, and cultures. The separation of these two facets of language use can help to map out the socio-cultural norms and conventions as well as the linguistic forms and strategies that underlie the pragmatic performance of different language speakers in a variety of target language use situatio...
Speech-like Pragmatic Markers in Argumentative Essays Written by Iranian EFL Students and Native English Speaking Students
In this study, the use of speech-like pragmatic markers in Iranian EFL students’ academic writing was investigated. Speech-like pragmatic markers, such as "I think", "well", "I guess", "actually", "anyway", and "anyhow", are linguistic components that are more specific to conversation than writing, and writers may wrongly include them in their academic writing. To examine the students’ use of speech-like ...